Automatically Discovering Local Visual Material Attributes
Authors
Abstract
Shape cues play an important role in computer vision, but shape is not the only information available in images. Materials, such as fabric and plastic, are discernible in images even when shapes, such as those of an object, are not. We argue that it would be ideal to recognize materials without relying on object cues such as shape, as this would allow us to use materials as context for other vision tasks, such as object recognition. In this paper, we derive a framework that discovers locally recognizable material attributes from perceptual material distances. The discovered attributes exhibit the same desirable properties as known semantic material traits [1], despite being discovered with only partial supervision. We obtain perceptual distances from a simple crowdsourcing process that can be readily extended to large datasets. Our attribute discovery process consists of three main components: perceptual distance measurement, attribute space definition, and attribute classifier learning. First, we measure pairwise perceptual distances between materials based on simple binary decisions on image patches collected through Amazon Mechanical Turk (AMT). Given these pairwise perceptual distances, we define a category-attribute space that preserves them while exhibiting the desired attribute properties. Finally, we train a set of classifiers to reproduce the category-attribute space as predictions on image patches.

Directly measuring perceptual material distances from human annotations would be an ideal source of data. One could ask annotators to rate the difference between two images, but this would require some form of calibration to compare answers across annotators. We instead reduce the question to a binary one: "Do these patches look different or not?" The underlying assumption is that if two image patches look similar, they do so as a result of at least one shared visual material property.

Discovering attributes given only a desired set of pairwise distances poses a challenge. A straightforward approach would be to directly train classifiers to predict attributes that encode the distances, but this is a highly under-constrained problem: we do not even know which attributes to associate with which categories. We instead separate the discovery process into two steps. First, we find a category-attribute mapping that encodes the desired distance matrix. We define this mapping as a category-attribute association matrix A and constrain it so that the resulting attributes are actually recognizable. Given a mapping, we can then define a set of attribute classifiers that reproduces it on image patches.

To discover realizable attributes, we encode our prior knowledge that recognizable attributes exhibit a particular distribution and sparsity pattern. We observe that semantic attributes, specifically visual material traits, have a Beta-distributed association with material categories: a material category generally either strongly exhibits a trait or does not exhibit it at all. We would like the values in A to be Beta-distributed to match the distribution of known material trait associations. We achieve this by minimizing the KL divergence between the distribution of values in A (approximated by a kernel density estimate) and the desired Beta distribution.
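For concreteness, the following is a minimal sketch of the first two components: estimating a pairwise distance matrix from binary AMT votes, and optimizing a category-attribute matrix A that preserves those distances while a KL term pushes its entries toward a Beta distribution. All names, shapes, and settings here (the vote aggregation, the category and attribute counts, the Beta(0.5, 0.5) target, the weight lam, the L-BFGS-B solver) are illustrative assumptions, not the authors' implementation.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta, gaussian_kde

rng = np.random.default_rng(0)

# Step 0: perceptual distances from binary AMT votes.
# votes[c, d] = fraction of "these patches look different" answers for
# patch pairs drawn from material categories c and d (random stand-in here).
C = 8                                    # number of material categories (assumed)
votes = rng.uniform(0.0, 1.0, (C, C))
D_target = (votes + votes.T) / 2.0       # symmetrize into a distance matrix
np.fill_diagonal(D_target, 0.0)

# Step 1: category-attribute mapping A.
# Find A (C x K) whose row-wise Euclidean distances reproduce D_target,
# while the entries of A follow a bimodal Beta distribution (most
# associations near 0 or 1), mirroring known material traits.
K = 5                                    # number of attributes to discover (assumed)
a, b = 0.5, 0.5                          # assumed Beta target on associations
grid = np.linspace(1e-3, 1.0 - 1e-3, 200)
log_q = beta.logpdf(grid, a, b)

def objective(A_flat, lam=0.1):
    A = A_flat.reshape(C, K)
    # distance-preservation term
    diff = A[:, None, :] - A[None, :, :]
    D_A = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    dist_err = ((D_A - D_target) ** 2).sum()
    # KL(KDE of A's entries || Beta(a, b)), approximated on a grid
    p = gaussian_kde(A.ravel(), bw_method=0.1)(grid) + 1e-12
    p /= p.sum()
    q = np.exp(log_q)
    q /= q.sum()
    kl = np.sum(p * (np.log(p) - np.log(q)))
    return dist_err + lam * kl

A0 = rng.uniform(0.05, 0.95, C * K)
res = minimize(objective, A0, method="L-BFGS-B",
               bounds=[(1e-3, 1.0 - 1e-3)] * (C * K))
A = res.x.reshape(C, K)                  # discovered category-attribute matrix

In this sketch the distance term keeps the rows of A faithful to the measured perceptual distances, while the KL term pushes individual associations toward 0 or 1, which is what makes the discovered attributes behave like the crisp semantic traits described above.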
For attribute classification, we use a general two-layer non-linear model (a sketch appears at the end of this abstract). This affords us the flexibility to place appropriate constraints on the classifier learning process, which is essential given the lack of full supervision. For both attribute discovery and classifier learning, we formulate the problem as non-linear optimization over the space of model parameters (the category-attribute mapping and the classifier weights).

Figure 1 shows an example of three discovered attributes. We visualize the attributes on a sample image by predicting their per-pixel probabilities; the attributes clearly highlight distinct image regions. Figure 2 demonstrates that we do indeed discover an attribute space that separates material categories. Quantitatively, we verify that our discovered ...
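Below is a minimal sketch of that classifier-learning step, assuming the matrix A from the previous snippet and hypothetical patch features; the authors' actual features, layer sizes, loss, and optimizer are not reproduced here. It only illustrates the idea of two non-linear layers trained so that a patch from category c reproduces the attribute profile in row A[c].

import numpy as np

rng = np.random.default_rng(1)
C, K, F, N = 8, 5, 64, 400        # categories, attributes, feature dim, training patches (assumed)
A = rng.uniform(0.0, 1.0, (C, K)) # stand-in for the discovered category-attribute matrix
X = rng.normal(size=(N, F))       # patch features (e.g., filter-bank responses; assumed)
y = rng.integers(0, C, N)         # material category of each training patch
T = A[y]                          # per-patch attribute targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H = 32                            # hidden units (assumed)
W1 = rng.normal(scale=0.1, size=(F, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, K)); b2 = np.zeros(K)

lr = 0.1
for _ in range(500):
    # forward pass: two non-linear layers, attribute probabilities in [0, 1]
    Z1 = sigmoid(X @ W1 + b1)
    P = sigmoid(Z1 @ W2 + b2)
    # squared-error loss against the category-attribute targets, backpropagated by hand
    G = (P - T) * P * (1.0 - P) / N
    dW2 = Z1.T @ G; db2 = G.sum(0)
    G1 = (G @ W2.T) * Z1 * (1.0 - Z1)
    dW1 = X.T @ G1; db1 = G1.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# P[i] now approximates A[y[i]]; evaluating the trained model on densely
# sampled patches would give per-pixel attribute maps like those in Figure 1.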
Similar resources
Discovering Latent Graphs with Positive and Negative Links to Eliminate Spam in Adversarial Information Retrieval
This paper proposes a new direction in Adversarial Information Retrieval through automatically ranking links. We use techniques based on Latent Semantic Analysis to define a novel algorithm to eliminate spam sites. Our model automatically creates, suppresses, and reinforces links. Using an appropriately weighted graph, spam links are assigned substantially lower weights while links to normal sit...
Full text

A Novel Approach to Background Subtraction Using Visual Saliency Map
Generally, the human visual system searches for salient regions and movements in video scenes to reduce the search space and effort. Using a visual saliency map for modelling provides important information for understanding in many applications. In this paper, we present a simple method with low computational load that uses a visual saliency map for background subtraction in a video stream. The proposed technique i...
Full text

Recognizing Material Properties from Images
Humans implicitly rely on properties of the materials that make up ordinary objects to guide our interactions. Grasping smooth materials, for example, requires more care than rough ones, and softness is an ideal property for fabric used in bedding. Even when these properties are not purely visual (softness is a physical property of the material), we may still infer the softness of a fabric by l...
Full text

Visual Image Reconstruction from Human Brain Activity using a Combination of Multiscale Local Image Decoders
Perceptual experience consists of an enormous number of possible states. Previous fMRI studies have predicted a perceptual state by classifying brain activity into prespecified categories. Constraint-free visual image reconstruction is more challenging, as it is impractical to specify brain activity for all possible images. In this study, we reconstructed visual images by combining local image ...
Full text

Discovering Multi-Aspect Structure to Learn From Loosely Labeled Image Collections
Tremendous amounts of images readily available on the Web have simple annotations and tags. However, at best these tags can be considered as “loose labels”, since the richness of the visual information in the images far exceeds the description a few words can provide. While the coarse tags do not translate to a unified visual concept—making it difficult to directly use tagged images to train st...
Full text